GRANT NUMBER DAMD17-94-J-4439


AD 


TITLE: Structural Studies of the PU.1 Transcription Factor

PRINCIPAL  INVESTIGATOR:  Kathryn  R.  Ely,  Ph.D. 

CONTRACTING  ORGANIZATION:  The  Burnham  Institute 

La  Jolla,  CA  92037 

REPORT  DATE:  October  1997 

TYPE  OF  REPORT:  Final 


PREPARED FOR: Commander

U.S.  Army  Medical  Research  and  Materiel  Command 
Fort  Detrick,  Frederick,  Maryland  21702-5012 


DISTRIBUTION  STATEMENT:  Approved  for  public  release; 

distribution  unlimited 


The  views,  opinions  and/or  findings  contained  in  this  report  are 
those of the author(s) and should not be construed as an official
Department  of  the  Army  position,  policy  or  decision  unless  so 
designated  by  other  documentation. 


19980611  091 




REPORT  DOCUMENTATION  PAGE 


Form  Approved 
OMB No. 0704-0188


Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources,
gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this
collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson
Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503.


1. AGENCY USE ONLY (Leave blank)

2.  REPORT  DATE 

October  1997 

3.  REPORT  TYPE  AND  DATES  COVERED 

Final  (1  Sep  94  -  31  Aug  97) 

4.  TITLE  AND  SUBTITLE 

Structural  Studies  of  the  PU.1  Transcription  Factor 

5.  FUNDING  NUMBERS 

DAMD17-94-J-4439

6.  AUTHOR(S) 

Kathryn  R.  Ely,  Ph.D. 

7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

The  Burnham  Institute 

La  Jolla,  CA  92037 

8.  PERFORMING  ORGANIZATION 
REPORT  NUMBER 

9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

Commander 

U.S.  Army  Medical  Research  and  Materiel  Command 

Fort  Detrick 

Frederick,  Maryland  21702-5012 

10.  SPONSORING/MONITORING 
AGENCY  REPORT  NUMBER 

11. SUPPLEMENTARY NOTES


12a.  DISTRIBUTION  /  AVAILABILITY  STATEMENT 


12b.  DISTRIBUTION  CODE 


Approved  for  public  release;  distribution  unlimited 


13. ABSTRACT (Maximum 200 words)


Ets  transcription  factors  play  a  role  in  development  and  are  implicated  in 
some  malignant  processes.  Recently,  members  of  this  large  gene  family 
have  been  identified  in  normal  gene  expression  in  mammary  cells  and 
also  in  breast  cancer  cell  lines.  In  these  studies,  the  crystal  structure  of  the 
DNA-binding domain of the PU.1 ets protein complexed to DNA has been
determined at 2.1 Å resolution. The DNA-binding domain is a conserved
region  that  binds  the  core  sequence  5’-GGAA/T-3’.  The  PU.1  domain 
binds  DNA  using  a  loop-helix-loop  motif  involving  conserved  amino  acids 
and  bases.  In  this  project  we  have  also  used  nuclear  magnetic  resonance 
(NMR)  to  determine  the  unbound  structure  of  the  domain  in  solution.  The 
two  structures  were  correlated  to  understand  the  process  of  DNA 
recognition  by  ets  proteins. 


14.  SUBJECT  TERMS  Transcription  Factor,  Multidimensional  NMR, 
Protein-DNA  Complex,  X-Ray  Crystallography,  DNA-Binding 

Domain,  Breast  Cancer 

15. NUMBER OF PAGES

33 

16.  PRICE  CODE 

17. SECURITY CLASSIFICATION OF REPORT

Unclassified

18. SECURITY CLASSIFICATION OF THIS PAGE

Unclassified

19. SECURITY CLASSIFICATION OF ABSTRACT

Unclassified

20. LIMITATION OF ABSTRACT

Unlimited

NSN 7540-01-280-5500
Standard Form 298 (Rev. 2-89)
Prescribed by ANSI Std. Z39-18
298-102


FOREWORD


Opinions, interpretations, conclusions and recommendations are
those of the author and are not necessarily endorsed by the U.S.
Army.

N/A  Where copyrighted material is quoted, permission has been
obtained to use such material.

Where material from documents designated for limited
distribution is quoted, permission has been obtained to use the
material.

Citations of commercial organizations and trade names in
this report do not constitute an official Department of Army
endorsement or approval of the products or services of these
organizations.

In conducting research using animals, the investigator(s)
adhered to the "Guide for the Care and Use of Laboratory
Animals," prepared by the Committee on Care and Use of Laboratory
Animals of the Institute of Laboratory Resources, National
Research Council (NIH Publication No. 86-23, Revised 1985).

N/A  For the protection of human subjects, the investigator(s)
adhered to policies of applicable Federal Law 45 CFR 46.

N/A  In conducting research utilizing recombinant DNA technology,
the investigator(s) adhered to current guidelines promulgated by
the National Institutes of Health.

In the conduct of research utilizing recombinant DNA, the
investigator(s) adhered to the NIH Guidelines for Research
Involving Recombinant DNA Molecules.

In the conduct of research involving hazardous organisms,
the investigator(s) adhered to the CDC-NIH Guide for Biosafety in
Microbiological and Biomedical Laboratories.




TABLE  OF  CONTENTS 


                          Page

Introduction                 6
Body - Final Report          6
    Task 1                   6
    Task 2                   7
    Task 3                   8
    Task 4                  14
Conclusions                 15
References                  16
Appendix                    19



Annual Report - Grant DAMD17-94-J-4439




INTRODUCTION 


Significance.  The  ets  gene  family,  a  recently  discovered  family  of 
regulatory  proteins,  includes  more  than  45  members  in  a  variety  of 
organisms  from  Drosophila  to  humans  (Wasylyk  et  al.,  1993;  Moreau- 
Gachelin,  1994).  These  molecules  play  a  role  in  normal  development  and 
have  been  implicated  in  malignant  processes  such  as  leukemia  or  breast 
cancer.  Enthusiasm  is  quite  strong  for  the  study  of  ets  proteins  in  cancer 
research  (Hromas  and  Klemsz,  1994)  because  the  family  is  large  and 
composed  of  individual  members  that  are  distinct  and  function  as 
regulatory  proteins  in  a  variety  of  cell  types.  With  respect  to  breast 
cancer,  ets-related  proteins  have  been  identified  in  normal  mammary 
cell-specific gene expression (Welte et al., 1994) as well as in breast
cancer cell lines (Trimble et al., 1993; Slamon et al., 1989; Scott et al.,
1989). An interesting association of ets proteins with malignant
transformation is suggested by the observation that the
phosphoprotein osteopontin is regulated by ets-related proteins (Denhardt
and  Guo,  1993;  Guo  et  al.,  1995).  Expression  of  this  protein  is  also 
responsive  to  hormones  such  as  estrogen  and  progesterone.  The 
expression  level  of  osteopontin  is  significantly  elevated  in  transformed 
cells  and  is  related  to  the  metastatic  potential  of  the  tumor  cells  (Guo  et 
al.,  1995;  Brown  et  al.,  1994;  Gardner  et  al.,  1994).  These  facts  suggest  a 
possible  mechanism  whereby  ets-related  proteins  may  be  implicated  in 
the  development  and/or  metastatic  spread  of  breast  tumors. 

Background.  The  PU.1  transcription  factor  is  an  ets  protein  expressed  in 
hematopoietic  cells  (Klemsz  et  al.,  1990).  The  ets  proteins  share  a 
conserved  domain  of  around  85  amino  acids  which  binds  as  a  monomer  to 
the DNA sequence 5'-(C/A)GGA(A/T)-3'. Within the ets family, the PU.1
sequence is the most divergent from ets-1 and yet there is 40% sequence
homology in the DNA-binding domains of these two proteins. We have
selected  the  PU.1  ets  DNA-binding  domain  for  structural  studies  using 
both  crystallography  and  nuclear  magnetic  resonance  (NMR)  to  derive  the 
data.  The  crystallographic  analyses  were  focused  on  the  structure  of  the 
protein-DNA  complex  and  the  NMR  work  highlighted  the  domain  in  solution, 
to  evaluate  dynamic  aspects  of  the  structure. 
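
The consensus quoted above can be written as a small pattern search. The following is a minimal sketch, assuming the consensus is read as (C/A)GGA(A/T) on a single strand; the function name and the example sequence are hypothetical and are not taken from the report.

```python
import re

# Degenerate ets consensus 5'-(C/A)GGA(A/T)-3' as a regular expression.
# The lookahead lets overlapping sites be reported as well.
ETS_CONSENSUS = re.compile(r"(?=([CA]GGA[AT]))")


def find_ets_sites(sequence):
    """Return (0-based start, matched 5-mer) for each consensus hit on the given strand."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in ETS_CONSENSUS.finditer(seq)]


if __name__ == "__main__":
    example = "TTCAGGAAGTTAGGATC"  # hypothetical sequence, for illustration only
    for start, site in find_ets_sites(example):
        print(start, site)
```

Only the strand supplied is scanned; a site on the opposite strand would require searching the reverse complement as well.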

BODY  -  FINAL  REPORT 

Task  1:  Large  scale  purification  of  the  PU.1  DNA-binding  domain 

The DNA-binding domain of PU.1 was cloned in the pET11 expression
vector by polymerase chain reaction amplification of the DNA-binding
domain from the full length cDNA as described previously (Klemsz et al.,
1990).  For  bacterial  expression,  pET  plasmid  constructs  were  used  to 
transform  E.  coli  BL21(DE3)pLysS  cells.  The  protein  was  expressed  in 
large  scale  cultures  and  purified  by  ion-exchange  chromatography  and  gel 
filtration.  This  protein,  representing  residues  160-272,  was  used  for 
crystallization  in  complex  with  DNA.  Attempts  to  co-crystallize  a 
shorter  fragment  with  DNA  were  unsuccessful.  However,  for  NMR  analysis 
another  protein  fragment  was  generated.  This  short  fragment  of  93 
residues was highly soluble and amenable to the high protein
concentrations required for the NMR experiments outlined in Task 3. The
conclusion from the protein purification work is that the length of the
recombinant  fragment  is  extremely  critical  for  successful  crystallization 
of  the  protein-DNA  complex  (published  observations  in  Pio  et  al.,  1995) 
and  also  to  produce  a  tight  globular  domain  for  clear  resonance  dispersion 
in  NMR  analyses. 

Task  2:  Synthesis  of  DNA  oligonucleotides 

DNA  oligonucleotides  of  various  lengths  were  screened  for  binding  in 
complex  with  the  PU.1  domain.  The  quality  of  the  oligonucleotides  was 
critical  for  successful  co-crystallization.  Oligonucleotides  were 
synthesized on a 10 μM scale using phosphoramidite chemistry and an
automated  DNA  synthesizer.  Oligos  were  purified  by  reverse  phase  HPLC 
at  56°C  in  an  acetonitrile  gradient.  After  removing  the  acetonitrile  by 
dialysis  against  triethylammonium  bicarbonate  buffer,  the 
oligonucleotides  were  desalted  in  ethanol  on  phosphocellulose  resin  and 
lyophilized.  For  the  series  of  oligonucleotides,  each  one  differed  in  length 
and  contained  the  core  sequence  GGAA  which  is  the  recognition  sequence 
for  PU.1.  Oligos  were  designed  to  provide  both  blunt-ended  duplex  DNA 
fragments  and  fragments  that  had  unpaired  T  or  A  bases  at  the  termini. 
The  latter  were  tested  because  they  have  the  potential  for  end-to-end 
stacking  in  the  crystal  lattice.  Ultimately  a  sixteen  base-pair 
oligonucleotide  with  the  sequence  5'-AAAAAGGGGAAGTGGG-3'  and  the 
complementary  strand  were  selected  and  synthesized  on  the  large  scale, 
purified  and  annealed  together  into  duplex  DNA.  This  oligo  promoted  the 
formation of crystals of the complex (results published in Pio et al.,
1995). It was evident in the electron density map of the complex that the
DNA  fragments  formed  long  extended  fiber-like  elements  in  the  crystal 
lattice  by  end-to-end  stacking  between  adjacent  oligonucleotides,  and 
that  this  was  a  major  interaction  for  nucleation  of  crystal  growth. 
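
As an illustration of the oligonucleotide described above, the sketch below derives an idealized Watson-Crick partner for the 16-mer and locates the GGAA core. It is only a sketch: the helper name is hypothetical, and the strands actually synthesized may differ at the termini (for example, through unpaired bases).

```python
# Translation table for Watson-Crick complements of the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand):
    """Return the reverse complement of a DNA strand written 5'->3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    top = "AAAAAGGGGAAGTGGG"          # 16-mer sequence quoted in Task 2
    bottom = reverse_complement(top)   # idealized, fully paired partner strand
    core_start = top.find("GGAA")      # 0-based position of the core
    print("top    5'-%s-3'" % top)
    print("bottom 5'-%s-3'" % bottom)
    print("GGAA core starts at base %d of %d" % (core_start + 1, len(top)))
```

With these inputs the core occupies bases 8-11 of the 16-mer, that is, the middle of the fragment.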

Task 3: Determination of the solution structure of the PU.1
domain by NMR

Work with a short fragment of the domain, produced to optimize the NMR
studies, permitted the production of a doubly-labeled (¹³C and ¹⁵N) sample
for the unambiguous assignment of all resonances in the 93 residue
domain. In heteronuclear experiments all atoms of the polypeptide
backbone have been assigned, including ¹³Cα. Extension to the side
chain resonances is beyond 70% complete at this stage. More than 850
nuclear  Overhauser  effects  (NOE)  are  observed.  Each  NOE  represents  a 
uniquely  identified  interaction  between  two  protons  and  depends 
principally  on  the  distance  between  two  nuclei.  This  distance  analysis 
gives  information  on  the  three-dimensional  shape  of  the  molecule  in  its 
folded  conformation.  For  the  PU.1  domain,  an  average  of  nine  NOEs  per 
residue  were  observed.  Based  on  these  data,  all  secondary  structure 
elements  have  been  defined.  At  this  point,  we  do  not  see  large  differences 
between  the  free  PU.1  domain  in  solution  and  the  domain  bound  to  DNA  in 
the  crystal  structure.  However,  by  NMR,  structural  assignments  can  be 
made  for  residues  at  the  amino-  and  carboxyl-terminal  regions  that  do  not 
contact  DNA.  These  regions  were  disordered  or  flexible  in  the  crystal  and 
were not seen in the electron density maps. All unambiguous assignments
are listed in Table 1.
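
The distance dependence of the NOE mentioned above is commonly approximated by an inverse sixth-power law. The sketch below assumes the isolated-spin-pair approximation and a calibration against a reference proton pair of known separation; the function name and the reference numbers are hypothetical and are not the procedure used in this project.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate an interproton distance from an NOE intensity.

    Assumes intensity is proportional to r**-6, so that
    r = r_ref * (I_ref / I) ** (1/6).
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("NOE intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


if __name__ == "__main__":
    # Hypothetical calibration: a reference pair 2.0 Å apart with intensity 64.
    print(noe_distance(1.0, ref_intensity=64.0, ref_distance=2.0))

    # Figures quoted in the text: more than 850 NOEs over 93 residues.
    print(850 / 93)
```

The second figure, a little over nine, is consistent with the average of nine NOEs per residue reported for the domain.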

To  derive  assignments  for  the  polypeptide  backbone  and  to  define 
secondary  structural  elements  of  the  domain,  the  backbone  amide 
resonances  were  analyzed  in  Heteronuclear  Single  Quantum  Coherence 
(HSQC) experiments. In the folded protein, some protons are protected and
shifts result from protecting the protons from exchange with solvent.
Thus,  using  experiments  designed  to  measure  the  chemical  shifts  in  a 
protein,  it  is  possible  to  deduce  information  about  the  secondary 
structural  features  of  the  molecules,  correlated  with  accessibility  to  the 
protein  surface  and  surrounding  solvent.  Figure  1  presents  the  NOE  and 
chemical  shift  data  for  the  PU.1  domain.  In  addition  to  the  expected 
protection of helical and β-strand elements, partial protection of
residues between the first and second β-strands was observed, as well as
residues at the carboxyl terminus. Interestingly, amide protons in helix
α2 were not protected. This observation may be due to increased mobility
in  the  time  regime  (milliseconds  to  hours)  measured  by  the  proton- 
deuterium  exchange.  However,  tryptophan  215  in  helix  3  does  contact  the 
DNA  backbone.  Other  structural  elements  that  were  seen  to  contact  the 
DNA  backbone  in  the  crystal  structure  of  the  protein-DNA  complex  were 
observed to be mobile by hydrogen-deuterium exchange; for example, the
turn in the helix-turn-helix motif and the 'wing' between β-strands 3 and
4. Amide protons in the recognition helix 3 were less protected in these
studies than those of helix 1, which does not contact the DNA. The results
of the chemical shift experiments correlate well with the crystal
structure and suggest that no major structural refolding occurs in the
domain on binding DNA.

Table 1 (portion, residues N236-G260): Resonance assignments (ppm)

Residue  15N     13Cα   13Cβ   13C'    HN      Hα          Hβ          Other 1H
...                                                                    1.43,1.32
N236     123.8   55.5   37.7   177.9   7.88    4.14        2.87
Y237     122.1   60.0   38.6   177.0   7.74    4.94        3.53,3.23
G238     112.7   47.1   -      176.0   8.23    4.18
K239     123.2   58.3   32.7   -       7.76    4.34        2.08,1.80   1.66,1.57
T240     112.0   62.7   67.7   177.1   7.63    4.00        3.70
G241     114.1   46.1   -      182.5   8.37    4.40,3.65
E242     126.5   61.9   29.4   176.4   10.41   3.78
V243     114.1   60.4   35.6   182.2   6.72    4.70        1.88        0.83,0.75
K244     129.4   54.2   35.4   182.9   8.97    4.80        1.82        1.45,1.40; 1.71; 3.25,3.13
K245     128.9   56.0   33.1   176.3   8.75    4.01        1.85        1.36,1.06; 1.67
V246     127.7   60.3   32.3   183.0   7.21    4.39        1.88        0.79,0.57
K247     126.9   57.7   31.3   176.5   8.03    4.13        1.86        1.46,1.38; 1.73
K248     124.9   56.1   33.8   175.7   7.86    4.23        1.61        1.46,1.30
K249     128.0   57.9   32.5   177.6   8.30    4.12        1.80
L250     125.7   56.1   40.0   181.4   9.24    3.91        2.06,1.95   1.63
T251     115.2   62.1   69.0   172.4   7.13    4.97        3.51        0.93
Y252     129.9   56.2   43.9   180.7   8.36    4.33        3.03
Q253     122.7   54.0   33.4   176.4   9.06    5.20        1.84,1.58   2.40,2.28
F254     131.3   58.6   41.5   175.8   8.93    5.28        3.39,2.96
S255     120.4   57.3   64.1   175.4   8.27    4.47        4.19
G256     113.8   47.2   -      176.6   8.87    3.98,3.82
E257     123.2   59.0   29.3   178.4   8.35    4.11        2.00        2.30,2.13
V258     119.6   63.1   31.5   176.4   7.36    4.03        2.31        0.86,0.62
L259     123.2   55.8   42.9   177.0   7.32    4.05        1.74        1.66
G260     116.0   46.2   -      -       7.44    3.74
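
The comparison of observed shifts with random coil values, as used for the Cα and Hα data, can be sketched as follows. This is an illustration only: the random-coil numbers are rounded, approximate placeholders that should be replaced by a published random-coil table, the 0.7 ppm cutoff is an assumed convention, and the function names are hypothetical. The observed Cα shifts are those listed for K244 and V246 in the assignments.

```python
# Approximate, rounded random-coil 13C-alpha shifts (ppm). These are
# illustrative placeholders, NOT values taken from the report.
RANDOM_COIL_CA = {"K": 56.5, "V": 62.5}

# Assumed cutoff (ppm) beyond which a deviation is treated as significant.
CUTOFF = 0.7


def secondary_shift(observed, random_coil):
    """Observed shift minus random-coil shift, in ppm."""
    return observed - random_coil


def classify_ca(delta, cutoff=CUTOFF):
    """Read a 13C-alpha secondary shift: downfield suggests helix, upfield strand."""
    if delta >= cutoff:
        return "helix-like"
    if delta <= -cutoff:
        return "strand-like"
    return "coil-like"


if __name__ == "__main__":
    observed = {"K244": 54.2, "V246": 60.3}  # 13C-alpha shifts from the assignments
    for residue, shift in observed.items():
        delta = secondary_shift(shift, RANDOM_COIL_CA[residue[0]])
        print(residue, round(delta, 2), classify_ca(delta))
```

A single residue is not diagnostic; in practice a run of consecutive deviations of the same sign is needed before a secondary structure element is inferred.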

To  probe  backbone  dynamics,  NMR  relaxation  techniques  were  performed. 
Because  motion  and  flexibility  can  markedly  influence  DNA  recognition  by 
DNA-binding  proteins,  such  measurements  can  provide  important  insight 
into  the  dynamics  of  the  protein-DNA  contacts.  Individual  quantitative 
measurements  of  the  dynamic  behavior  for  each  of  the  amide  proton 
resonances from the polypeptide backbone were made. These
measurements  probe  a  much  faster  time  regime  than  the  hydrogen- 
deuterium  experiments  and  can  provide  motional  information  in  the 
picosecond-nanosecond  scale  as  well  as  the  microsecond-millisecond 
time  regimes.  A  flexible  element  can  adopt  multiple  conformations  and 
thus  facilitates  binding  to  the  target  molecule  (in  this  case  DNA). 

Analyses of these relaxation data to date are presented in Figure 2.
Relaxation rates (R1 and R2) and NOE intensities indicate a higher
intrinsic flexibility in the loop between helices 2 and 3. We reported that
this connecting segment is actually a loop, intermediate in length
relative to its counterpart in other members of the HTH family (Pio et al.,
1996). In PU.1, this loop, which is seven residues long, contacts DNA and
has the lowest NOE intensities and relaxation rates of the domain.
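
The way flexible segments are read off from per-residue order parameters can be sketched as below, using the 0.85 cutoff quoted in the Figure 2 legend. The residue numbers and order-parameter values in the example are hypothetical and serve only to show the bookkeeping.

```python
# Cutoff from the Figure 2 legend: values above 0.85 indicate greater
# conformational restriction.
S2_RIGID_THRESHOLD = 0.85


def flexible_residues(order_parameters, threshold=S2_RIGID_THRESHOLD):
    """Return residue numbers whose order parameter is at or below the threshold."""
    return sorted(res for res, s2 in order_parameters.items() if s2 <= threshold)


if __name__ == "__main__":
    # Hypothetical per-residue order parameters, for illustration only.
    example = {200: 0.90, 217: 0.60, 218: 0.70, 230: 0.88}
    print(flexible_residues(example))
```

Runs of consecutive flagged residues would correspond to flexible loops of the kind described for the segment between helices 2 and 3.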


Figure 1: Structural elements for the free DNA-binding domain of PU.1, determined by NMR.
The sequence of the domain is listed on the top line. Underneath the sequence, Lines 2-5 indicate
the sequential NOE correlations (intensities). Lines 6-7 present chemical shift changes
relative to the corresponding random coil values for Cα and Hα. Amide resonances that could be
detected in the first HSQC spectrum in the series of ¹H-²H exchange experiments are marked
with open circles in Line 8, while residues that are observable after several hours are indicated
by filled circles. The inferred secondary structural elements along the sequence are listed at
the bottom of the figure.

Figure 2: Relaxation measurements for the PU.1 domain. NOE, R1, R2 and S² values are
plotted for each of the residues in the domain. R1 and R2 are relaxation rates and S² is an order
parameter that gives a measure of the rigidity of an element. Values of S² > 0.85 indicate
greater conformational restriction. Note that the S² values were considerably lower for three
loops: the loop between helix 2 and helix 3 (the DNA recognition helix), the loop between
β-strands 3 and 4, which contacts the DNA, and the loop between helix 3 and β-strand 3 (no DNA
contacts). In R2, the asterisk indicates that the residue M187 has a weak crosspeak in the ¹⁵N-
HSQC spectrum. In S², two asterisks indicate two residues which are not resolved.

[Figure 2 axes: R2 (1/s), R1 (1/s), NOE intensities]


Task  4:  Determination  of  the  crystal  structure  of  the  PU.1 
domain  complexed  to  DNA 

The PU.1-DNA complex crystallized in the space group C2 with a=89.1,
b=101.9, c=55.6 Å and β=111.2°, with two complexes in the asymmetric
unit. Crystal growth was induced from solutions containing cacodylate
buffer at pH 6.5, zinc acetate and polyethylene glycol 600 (published data
in Pio et al., 1995). The crystals diffracted to 2.3 Å resolution (2.1 Å at
the LURE synchrotron). Four heavy atom derivatives were prepared by
soaking crystals in mercury compounds and by co-crystallizing with
iodinated oligonucleotides. The structure was solved by the multiple
isomorphous replacement method plus anomalous scattering from the
mercury compound (MIRAS). An atomic model fitted to electron density
maps calculated at 2.3 Å resolution (2.1 Å refined) revealed the structure
of the complex and was reported (Kodandapani et al., 1996). This is the
first (and only) report of a crystal structure of an ets protein.
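
The cell parameters quoted above fix the unit cell volume. The sketch below applies the standard monoclinic formula V = a·b·c·sin β and divides by the number of complexes per cell, assuming four asymmetric units per cell for space group C2 and the two complexes per asymmetric unit stated in the text; the function and constant names are hypothetical.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume of a monoclinic unit cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


# Space group C2 has four general positions (asymmetric units) per cell.
ASYMMETRIC_UNITS_PER_CELL_C2 = 4
# From the text: two complexes in the asymmetric unit.
COMPLEXES_PER_ASYMMETRIC_UNIT = 2

if __name__ == "__main__":
    volume = monoclinic_cell_volume(89.1, 101.9, 55.6, 111.2)  # Å, Å, Å, degrees
    per_complex = volume / (ASYMMETRIC_UNITS_PER_CELL_C2 * COMPLEXES_PER_ASYMMETRIC_UNIT)
    print("cell volume (Å^3): %.0f" % volume)
    print("volume per complex (Å^3): %.0f" % per_complex)
```

Dividing the volume per complex by the molecular mass of the complex would give the Matthews coefficient; that mass is not stated in this section, so the step is left out.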

The PU.1 domain assumes a tight globular structure (33 x 34 x 38 Å³)
formed by three α-helices and a four-stranded antiparallel β-sheet. The
domain topology is similar to the structure of other ets family proteins
fli-1 (Liang et al., 1994), murine ets-1 (Donaldson et al., 1996), and human
ets-1 (Werner et al., 1995) determined in solution by NMR. The structures
revealed that ets domains share a common folding pattern that is similar
to α+β helix-turn-helix (HTH) DNA-binding proteins and resembles 'winged'
HTH proteins such as HNF-3γ (Clark et al., 1993). The domain contacts
DNA from three sites: the recognition helix (α3), the loop between
β-strands 3 and 4 (a 'wing'), and the turn in the HTH motif (α2-turn-α3).
This turn is longer than the equivalent in many other HTH proteins and is
actually a loop. Our structure revealed a new pattern for HTH recognition
and a novel mode of DNA binding (reported in Kodandapani et al., 1996).

The DNA is bent in the complex (8°) when compared to 'canonical' B-DNA
structure and is curved uniformly along the entire 16 bp length. The minor
groove is slightly enlarged (~2-3 Å from the mean) in the GGAA region at
the midpoint of the oligonucleotide. A human ets-1-DNA complex was
reported to differ markedly in where the protein contacted the DNA
(Werner et al., 1995), with a kinked deformation of the DNA by 60°.
However, this model was found to be in error due to a misinterpretation of
the NMR data in the structure solution. The authors of that structure
retracted the model (Werner et al., 1996 erratum) and reported that the
ets-1-DNA complex was in fact quite similar to our PU.1-DNA structure.

Four strictly conserved residues on the surface of the domain are likely to
be important for DNA-binding by all members of the ets family. Arg 232
and Arg 235 emanate from the recognition helix and contact the conserved
GGAA sequence in the major groove of DNA. Lys 245 from the 'wing'
contacts the phosphate backbone of the GGAA strand in the minor groove
upstream from the core sequence, and Lys 219 in the loop of the HTH motif
forms a salt bridge with the phosphate backbone of the opposite strand
downstream of the GGAA core. Substitutions of glycine at each of these
four conserved sites abolished DNA binding, confirming the functional
importance of these residues. These interactions were further evaluated
by mutagenesis of PU.1 and comparisons of mutagenesis on other ets
molecules. We reported (Pio et al., 1996) that these interactions
represent the paradigm for ets recognition, which is expected to be
reproduced in all ets proteins.

DNA  bending  that  is  stabilized  by  the  PU.1  domain  may  serve  as  an 
illustration  of  the  hypothesis  of  DNA  bending  by  phosphate  neutralization. 
It  has  been  demonstrated  that  when  neutral  methylphosphonates  are 
introduced  into  DNA  fragments,  bending  of  the  DNA  occurs  due  to  repulsion 
of the remaining anionic phosphates (Strauss and Maher, 1994). Strauss and
Maher proposed that binding of a cationic protein to DNA could have the same
effect, and it appears that PU.1 induces and stabilizes this type of bending
in the DNA. There are seven sites of phosphate neutralization in the
PU.1-DNA complex, on one face of the DNA helix.

Conclusions 

The  work  accomplished  in  this  project  has  been  a  significant  contribution 
to  our  understanding  of  the  way  that  ets  proteins  recognize  DNA.  We  have 
successfully  produced  the  first  crystal  structure  of  an  ets  protein  and  the 
model  will  serve  as  the  basis  to  begin  to  describe  the  atomic  detail  for 
protein-DNA  contacts.  The  contact  regions  have  been  evaluated  by 
measurements  in  solution  by  NMR  indicating  structural  features  where 
molecular  dynamics  may  contribute  to  DNA  recognition.  This  study  of  the 
PU.1  molecule  is  the  first  study  of  an  ets  molecule  where  both  crystal 
data and NMR solution data are produced.

Transcription factors belonging to the ets family regulate gene
expression and share a conserved ETS DNA-binding domain that binds
to the core sequence 5'-(C/A)GGA(A/T)-3'. The domain is similar to
α+β ("winged") helix-turn-helix DNA-binding proteins. The crystal
structure of the PU.1 ETS domain complexed to a 16-base pair
oligonucleotide revealed a pattern for DNA recognition from a novel
loop-helix-loop architecture (Kodandapani, R., Pio, F., Ni, C.-Z.,
Piccialli, G., Klemsz, M., McKercher, S., Maki, R. A., and Ely, K. R.
(1996) Nature 380, 456-460). Correlation of this model with
mutational analyses and chemical shift data on other ets proteins
confirms this complex as a paradigm for ets DNA recognition. The
second helix in the helix-turn-helix motif lies deep in the major
groove with specific contacts with bases in both strands in the core
sequence made by conserved residues in α3. On either side of this
helix, two loops contact the phosphate backbone. The DNA is bent (8°)
but uniformly curved without distinct kinks. ETS domains bind DNA as
a monomer yet make extensive DNA contacts over 30 Å. DNA bending
likely results from phosphate neutralization of the phosphate
backbone in the minor groove by both loops in the loop-helix-loop
motif. Contacts from these loops stabilize DNA bending and may
mediate specific base interactions by inducing a bend toward the
protein.


Transcription factors bind to target DNA sequences to regulate
metabolic functions such as growth and differentiation. Typically,
the molecular scaffold for DNA recognition is conserved within a
given family of DNA-binding proteins. In some cases the similarity of
these scaffolds suggests an evolutionary relationship between
different families or comparison of scaffolds reveals a structural
similarity that was obscured by sequence comparisons alone.

A recently discovered family of regulatory proteins, the ets gene
family, includes more than 45 members in a variety of organisms from
Drosophila to humans (1, 2). These molecules play a role in normal
development and have been implicated in malignant processes such as
erythroid leukemia and Ewing's sarcoma. The DNA-binding domain of ets
proteins is a conserved region (ETS domain) that is about 85 residues
in length. Although ets proteins share a homologous sequence in the
ETS domain, they differ in length and in the relative position of
this domain. In some molecules, the ETS domain is found at the
carboxyl terminus (e.g. PU.1 (3); ets-1 (4); ets-2 (5)), while in
others the domain is located in the middle of the sequence (erg (6)),
or in the amino-terminal region (elk-1 (7)). Flanking regions are
thought to form other functional domains that influence
protein-protein recognition or inhibitory domains that mask the
DNA-binding site (8, 9).¹ In ets-1, an α-helix that is located in an
inhibitory domain immediately NH2-terminal to the ETS domain unfolds
on DNA-binding (10). Regardless of the position of the ETS domain
within the intact ets proteins, there is strong sequence homology in
this conserved region.

We have determined the crystal structure of the ETS domain of the
PU.1 transcription factor complexed to DNA (11). The domain is
similar to α+β helix-turn-helix (HTH)² DNA-binding proteins and
contacts a 10-base pair region of duplex DNA that is bent (8°) but
uniformly curved without distinct kinks. The PU.1 domain assumes a
tight globular structure with three α-helices and a four-stranded
antiparallel β-sheet enclosing a hydrophobic core. The topology of
the domain is similar to the structures of other ets family proteins
fli-1 (12), murine ets-1 (13), and human ets-1 (14) determined in
solution by NMR. The common molecular scaffold is similar to
DNA-binding proteins such as CAP (15) and resembles "winged"-HTH
proteins including HNF-3γ (16). ETS domains bind as a monomer to the
core sequence 5'-(C/A)GGA(A/T)-3'.

The PU.1 domain contacts DNA from three sites: the recognition helix
(α3) interacts with the GGAA core sequence in the major groove, while
contacts with the phosphate backbone on either side of this site are
made in the minor groove by two loops. Therefore, the PU.1 ETS domain
binds DNA by a loop-helix-loop motif. One loop is formed between
β-strands 3 and 4 (a "wing") and the other is a loop in the position
of the turn in the HTH motif (α2-turn-α3). The protein-DNA contacts
stabilize a uniform bending of the duplex DNA that likely is due to
phosphate neutralization by the PU.1 domain. Surprisingly, the
protein-DNA interactions reported in the NMR structure of a human
ets-1-DNA complex (14) differed dramatically from this pattern,
involving different contacts and significant DNA deformation. Because
of this discrepancy, we chose to test the validity of the PU.1-DNA
complex as a model for other ets proteins. As reported here, when the
results of mutational analyses on a number of ets proteins are
correlated with the structure of the PU.1-DNA complex and with
chemical shift data measured with the fli-1 (12) and murine ets-1
(13) molecules, the loop-helix-loop scaffold is confirmed as a
general model for DNA recognition by ets proteins. This pattern
defines a new class of HTH DNA-binding proteins. The molecular
pattern of DNA recognition by ets proteins is compared to other HTH
proteins for which crystal structures of the protein-DNA complexes
are available.

¹ M. Klemsz and R. A. Maki, unpublished results.
² The abbreviation used is: HTH, helix-turn-helix.

Table I
Crystallographic refinement statistics

(%)                                      3.3
Resolution range (Å)                     8-2.1
Average B (Å²)                           31.65
Crystallographic R-factor (%)            22.5
Rfree (%)                                28.7
Number of reflections used               22022 (F > 2σ(F))
Number of protein atoms                  1486
Number of DNA atoms                      1300
Number of solvent atoms                  143

Root mean square deviation from ideal    r.m.s.   Target
Bond distance (Å)                        0.012    (0.06)
Bond angles (degrees)                    1.629    (10)
Dihedral angles (degrees)                1.575    (20)

EXPERIMENTAL  PROCEDURES 

PU.1-DNA Complex - A recombinant fragment encompassing residues
160-272 from the murine ets protein PU.1 was crystallized in
complex with a 16-base pair oligonucleotide representing a consensus
PU.1 DNA-binding site (3) as described previously (17). The complex
crystallized in space group C2 with a = 89.1, b = 101.9, c = 55.6 Å, and
β = 111.2°. There are two complexes in the asymmetric unit. The length
of the oligonucleotide was critical for crystallization and the
oligonucleotide used to form the complex permitted end-to-end stacking of the
DNA in the crystal lattice with the formation of pseudo-base pairing by
the overhanging A and T bases.
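
For orientation, the cell constants quoted above fix the unit cell volume. The short sketch below is an illustration added here, not a calculation reported by the authors; it applies the standard monoclinic relation V = a·b·c·sin β, and the function name is hypothetical.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume of a monoclinic unit cell (alpha = gamma = 90 degrees).

    Lengths are in angstroms, beta in degrees; the result is in cubic angstroms.
    """
    return a * b * c * math.sin(math.radians(beta_deg))


# Cell constants quoted in the text for the C2 crystal form.
volume = monoclinic_cell_volume(89.1, 101.9, 55.6, 111.2)
print(f"Unit cell volume: {volume:.0f} cubic angstroms")
```

The value comes out at roughly 4.7 × 10⁵ Å³, shared by the symmetry-related copies of the two complexes in the asymmetric unit.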

Crystallographic Analyses - The initial structure analysis of the
complex solved by the MIRAS method was reported (11). For this first phase
of the study, a native data set and four heavy atom data sets were
collected using a Rigaku RU200 rotating anode x-ray source and two
San Diego Multiwire Systems area detectors. The initial data sets were
collected from flash-frozen crystals at 2.3-Å resolution. To refine the
structure further, another native data set extending to 2.1 Å was
collected at the LURE synchrotron source in Orsay, France. Diffraction
data were collected at station D41 interfaced with the Mark III
multiwire proportional area detector. Data sets were processed using
MOSFLM (18) and ROTAVATA, AGROVATA, and TRUNCATE in the CCP4
package (19). In the present study, this native data set was scaled to the
data collected in the home laboratory by Wilson scaling and the
synchrotron data were incorporated into the refinement. The programs
PHASES (20), FRODO (21), and X-PLOR (22) were used for structure
solution, model building, and refinement. The current R-factor is 22.5%
for 6 to 2.1 Å data (22,022 reflections). The average overall B-factor for
2929 non-hydrogen atoms (1486 protein atoms + 1300 DNA atoms +
143 solvent oxygens) is 31.6 Å². The refinement statistics are presented
in Table I. There were 11 disordered residues at the amino terminus of
the domain and 14 disordered residues at the carboxyl terminus of the
recombinant fragment that were excluded from the model. These
residues were not ordered even when the resolution was extended to 2.1 Å.
For all residues representing the complete ETS domain (residues
171-258), the electron density was clear and permitted unambiguous fitting
of both backbone and side chain atoms. More solvent atoms have been
added to the model. Only minimal changes in the configuration of some
side chains were evident in the high resolution map. The
stereochemistry of all main chain torsion angles in the domain falls within
energetically favorable limits (Fig. 1), indicating that no segment of the domain
is denatured or randomly configured. The DNA was clearly defined even
in the first MIRAS map.
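
The R-factor quoted above is the conventional crystallographic residual, R = Σ| |Fo| - |Fc| | / Σ|Fo|. The sketch below shows that definition on a handful of made-up amplitudes; the function name and the toy numbers are hypothetical and are not data from this study. It also rechecks the atom count given in the text.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor for paired structure-factor amplitudes."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical amplitudes, chosen only to make the arithmetic easy to follow.
toy_f_obs = [100.0, 50.0, 25.0, 25.0]
toy_f_calc = [90.0, 55.0, 30.0, 20.0]
toy_r = r_factor(toy_f_obs, toy_f_calc)
print(f"R = {100 * toy_r:.1f}%")

# Atom counts quoted in the text: protein + DNA + solvent oxygens.
total_atoms = 1486 + 1300 + 143
print(total_atoms)
```

In practice the sums run over all reflections used in refinement (22,022 here), and Rfree is the same quantity computed on a test set held out of refinement.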

Analyses of DNA Helical Parameters - To analyze the stereochemical
basis for the uniform bending observed in the oligonucleotide bound in
complex to PU.1, the DNA superstructure was measured (23, 24) and
four parameters were calculated that describe the conformation of the
DNA bases and the phosphate backbone. The values were calculated
(excluding the 5' A overhang) to analyze helical parameters along the
length of the oligonucleotide and to compare these with standard
B-DNA parameters. The geometry of dinucleotide steps was analyzed for
three rotational angles defining twist, tilt, or roll and for one
translational distance, i.e. rise. The values for these parameters are presented
in Table II.

Fig. 1. Ramachandran diagram for the current model of the
PU.1 ETS domain. This diagram presenting φ,ψ angles (46) was
produced using the PROCHECK programs (47). Glycine residues are
represented by triangles. Various regions of the plot with different levels of
shading are indicated with the darkest shaded areas corresponding to
the energetically most favorable φ,ψ angles. Horizontal axis: Phi (degrees).

Sequence Alignments and Structural Comparisons - Sequence
alignments for ets proteins were made using GENEWORKS. The individual
sequences were collected from the SWISSPROT data base and regions
corresponding to the ETS domains were excised from the full-length
protein before the alignment process began (25). The results of this
comparison are presented in Fig. 2. Sequence comparisons between
members of different families of HTH proteins were made using the
program QUANTA (Molecular Simulations, Inc.), especially when
structure-based alignments were utilized. To search structure data bases to
identify proteins with similar overall scaffolds to the PU.1 domain, the
algorithm DALI developed by Chris Sander (26) was used. For
structural comparisons of HTH proteins, coordinates were obtained from the
Brookhaven Protein Data Bank (27): 434 cro repressor (code 3CRO), λ
repressor (code 1LMB), CAP (code 1CGP), and heat shock factor (code
2HTS). The coordinates for HNF-3γ were kindly provided by Dr. S.
Burley. The actual structural comparisons/graphical analyses were
performed using QUANTA (Molecular Simulations, Inc.) and the
Alberta/Caltech program TOM based on FRODO (21).

RESULTS  AND  DISCUSSION 

The similarity of the structural organization of the ETS
domains of PU.1 (11), fli-1 (12), and ets-1 (13, 14) and the
presence of a conserved hydrophobic core suggest that this
overall scaffold will be highly conserved in all members of the
family. To facilitate comparisons, the sequences of the ETS
domains of 33 members of the ets family are aligned (Fig. 2).
The sequences of this domain in a number of ets proteins are
identical for two or more species, representing a significant
level of homology within the family. The results of mutational
substitutions in a number of ets proteins are tabulated in
Table III.
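
Identifying strictly conserved positions in an alignment such as Fig. 2 amounts to finding the columns in which every sequence carries the same residue. The following minimal sketch illustrates the idea; the function name and the three toy sequences are hypothetical and are not taken from the alignment in this paper.

```python
def strictly_conserved_columns(aligned):
    """Return (0-based column, residue) for columns identical in every sequence.

    Sequences must already be aligned to equal length; '-' marks a gap and a
    column containing a gap is never counted as conserved.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    conserved = []
    for index, column in enumerate(zip(*aligned)):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            conserved.append((index, column[0]))
    return conserved


# Hypothetical toy alignment, for illustration only.
toy_alignment = ["LKWG-RK", "LRWGDRK", "LKWG-RR"]
print(strictly_conserved_columns(toy_alignment))
```

Applied to a real alignment, the column indices would still need to be mapped back to PU.1 residue numbers before comparison with the mutational data.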

Hydrophobic Core - The importance of the hydrophobic core
was verified by site-directed mutagenesis of the PU.1 domain
(11). Of the 14 strictly conserved residues in the domain, seven
are found in the hydrophobic core. Single substitution of
glycine for five of these residues in PU.1 (Fig. 3) resulted in loss of
DNA binding. Two of these core residues also contact the DNA
phosphate backbone. The peptide amide nitrogen of Leu¹⁷⁴
interacts with O2P of C-22 and the side chain NE-1 of Trp²¹⁵
forms a hydrogen bond with O1P from T-23. Mutation of
tryptophan 215 to arginine results in loss of DNA binding in ets-1
(28, 29; see Table III). Substitutions in the hydrophobic core
affect DNA binding probably because the changes disrupt the
tight globular structure of the domain. Residues 174 and 215
are doubly critical for DNA binding since they represent both
important structural residues in the domain core and actual
DNA contact residues. In summary, residues in the
hydrophobic core are critical for the formation of the overall scaffold for
ets recognition.

Table II
DNA helical parameters of the 16-base pair oligonucleotide bound to PU.1

DNA structural parameters were refined in X-PLOR (22) and then analyzed using the programs developed by Babcock and Olson (24). For
comparison, typical twist angles for B-DNA are 34.3°, roll angles are 0°, and rise values are 3.38 Å.

                 Inter-base pair                              Intra-base pair
       Base      Helical                                      Propeller
       pair      twist (°)   Roll (°)   Rise (Å)   Slide (Å)  twist (°)   Buckle (°)
   1   A-T       36.09       -0.06      3.18        0.13      -18.22       11.26
   2   A-T       39.67       -0.48      3.23        0.06      -16.18       13.88
   3   A-T       34.10       -6.19      3.30       -0.71      -14.38        1.39
   4   A-T       36.09        0.76      3.20       -1.00      -17.25        3.38
   5   G-C       27.50        6.59      3.57       -0.75       -8.02        2.87
   6   G-C       29.06        7.29      3.11        0.24        2.10      -17.28
   7   G-C       33.02        3.96      3.47        0.59        7.77        5.90
   8   G-C       27.44        6.75      3.21       -0.37      -14.98        7.43
   9   A-T       36.02        9.00      3.10        0.16      -21.87       10.59
  10   A-T       39.21        3.24      3.35       -0.61      -19.29        5.58
  11   G-C       24.41       -0.84      3.38       -0.94      -13.13      -10.09
  12   T-A       37.07        3.75      3.29        0.98      -12.30       -7.49
  13   G-C       32.70        9.93      3.30       -0.14       -6.40        3.66
  14   G-C       33.29        4.99      3.23       -0.02      -10.53       -8.09
  15   G-C                                                     -9.96       -2.26

Fig. 2. Sequence alignment of the DNA-binding domain of 33 members of the ets family. The amino acid sequence of PU.1 is listed at
the top of the figure and residues that are strictly conserved in the family are enclosed in boxes. The sequences were obtained from the SWISSPROT
data base and original citations for the sequences are given in the data base. Secondary structural features of the PU.1 ETS domain are indicated
above the alignment. Directly under the PU.1 sequence, the residues that contact DNA are indicated: B, base interaction; P, phosphate backbone
interaction; W, water-mediated interaction. Residues found in the hydrophobic core in PU.1 and expected to be located in the hydrophobic interior
of all ets proteins are shaded. In some cases, the sequences for ets proteins for two or several species are identical, and therefore only one sequence
has been listed to avoid duplication.

Molecular Scaffold of ETS Domains - To evaluate the
conservation of this scaffold within the ets family, the α-carbon
backbones of PU.1 (11) and fli-1 (12) domains were
superimposed utilizing both sequence homology and secondary
structure similarities. For this purpose, a single model from the
ensemble of structures deposited in the data bank was used for
the NMR-derived fli-1 structure. This scaffold provides the

Table III
Mutations in the DNA-binding domain of ets family proteins that abolish DNA binding

The reference for each mutational substitution is given in parentheses with the protein studied.

  ets Protein      Residue in PU.1ᵃ   Single mutation    Multiple mutations
  PU.1 (11)        174H, D            L → G
  PU.1 (11)        178H               L → G
  ets-1 (29)       Multiple                              174H, D, 175H, 177H, 178H
  ets-1 (28)       185                K → P
  ets-1 (28)       191H               I → T
  PU.1 (11)        193H               W → G
  ets-1 (28)       194                T → I
  ets-1 (28)       196                D → G
  ets-1 (28)       201H               F → L
  PU.1 (11)        201H               F → G
  PU.1 (11)        203H               F → G
  ets-1 (28)       212                A → V
  ets-1 (28)       214                R → G
  ets-1 (28, 29)   215H, D            W → R
  PU.1 (11)        215H, D            W → G
  ets-1 (28)       219D               K → Xᵇ
  PU.1 (11)        219D               K → G
  ets-1 (28)       222D               K → Xᵇ
  ets-1 (28)       227H               Y → C
  fli-1 (12)       228D               D → H, Q, K
  ets-1 (28)       232D               R → Xᵇ
  fli-1 (12)       232D               R → D, K, N
  PU.1 (11)        232D               R → G
  fli-1 (29)       234H               L → V
  ets-1 (29)       Multiple                              234H, 235, 236, 237
  fli-1 (12)       235D               R → K, D, N, E
  PU.1 (11)        235D               R → G
  fli-1 (12)       236D               Y → V
  ets-1 (48)       242                I → E, G, P, V
  ets-1 (28)       243H               I → T
  PU.1 (11)        245D               K → G
  ets-1 (28)       248                K → I
  ets-1 (28)       254H               F → L

ᵃ Residue numbers of the PU.1 sequence are given to facilitate direct comparison with the sequence alignment in Fig. 2; H indicates a residue
in the hydrophobic core of the PU.1 domain and D indicates residues which contact DNA in the PU.1-DNA complex, either directly or by
water-mediated interactions.

ᵇ X, substitution by any amino acid.


Fig. 3. Stereodiagram of the PU.1 ETS domain-DNA complex. The α-carbon backbone for residues 171-258 is shown bound to DNA with
the bases in the GGAA core in bold lines. The ETS module is composed of three α-helices and a four-stranded antiparallel β-sheet enclosing a
hydrophobic core. There are seven strictly conserved residues in this core (Fig. 2). Substitution of glycine for each of the five core residues in PU.1,
shown on the model, abolishes DNA binding.


framework for the three structural features arranged in a
loop-helix-loop pattern that mediate precise DNA binding by
the PU.1 domain. In order to delineate the loop-helix-loop motif
in other ets domains and to predict whether this motif is the
paradigm for ets recognition, we also superimposed the
α-carbon skeleton of the fli-1 domain onto the PU.1 backbone bound
to the DNA (Fig. 4). Since this is one of an ensemble of
structures from the NMR study, detailed comparisons are not
possible. However, general comparisons are useful to establish
overall structural similarities between the two related
molecules. Although the structure of the fli-1-DNA complex was not
determined, it should be noted that the published structure of
the fli-1 domain (12) reflects a bound conformation since the
NMR experiments were conducted on a 98-residue protein
fragment complexed to a 16-base pair oligonucleotide.

features. The DNA shown in the figure is the oligonucleotide bound to the PU.1 domain.

As shown in Fig. 4, there is close similarity in the overall
scaffold of the ETS domains but several other features of the
superposition are worth noting. First, the positions of the four
conserved residues that contact DNA are very similar in PU.1
and fli-1. In PU.1, two conserved arginines, 232 and 235, make
hydrogen bonds with the bases GGA of the PU core sequence.
Arg²³⁵(NH-2) forms a hydrogen bond with G-8(O-6) while
Arg²³²(NH-1) makes hydrogen bonds with two bases G-9(O-6)
and A-10(N-6) on one strand and a water-mediated contact
with T-23(O-4) on the opposite strand. These arginines are
strictly conserved in all members of the ets family and the GGA
sequence is the consensus DNA sequence recognized by the ets
proteins. Therefore, these interactions are expected to be
reproduced in all ets protein-DNA complexes. When the fli-1
domain is superimposed on PU.1, the side chains of conserved
arginines 232 and 235 in the recognition helix are within
hydrogen-bonding distance of the same bases in the GGAA core
sequence in the major groove. Substitution of these residues by
any other amino acid, even closely related hydrophilic amino
acids, results in loss of DNA recognition in PU.1, fli-1, and other
ets proteins (see Table III). Conserved lysines, residues 219 in
the loop (HTH) and 245 in the wing, contact the phosphate
backbone in PU.1 and are in a position to make the same
contacts in fli-1. Mutational substitutions for Lys[?] in PU.1
(11) and the equivalents of Lys[?] and Lys[?] (see Table III) in
fli-1 (12) or ets-1 (28) disrupt DNA binding, presumably due to
the loss of the phosphate backbone interactions. In fli-1, the
equivalents of Lys[?] and Met[?] in PU.1 (from the HTH loop)
and residues 248/249 (from the wing loop) were identified
within 4 Å of DNA by intermolecular NOEs (12). Chemical
mapping experiments with the murine ets-1 molecule
suggested a similar pattern with a major groove contact zone and
interactions with both adjacent minor grooves (30).

DNA Conformation in the PU.1 ETS Domain-DNA Complex - The
PU.1 ETS domain contacts DNA over a 10-base pair
area. The DNA is bent by 8° in the complex but does not deviate
significantly from B-form DNA (see Table II). As can be seen in
Fig. 4, the DNA is uniformly curved over the length of the
16-base pair fragment. There is an average helical twist of 33°,
with 10.8 base pairs per turn and an average rise per base pair
of 3.2 Å. The minor groove is slightly enlarged (~2-3 Å from the
mean) in the GGAA region at the midpoint of the
oligonucleotide. A “spine” of water molecules, similar to that observed in
the crystal structure of a B-DNA dodecamer (31), is located in
the minor groove from bases 8 to 12. Binding of the ETS domain
induces a DNase I-hypersensitive site 3' to the C-26 base in the
core sequence (30). This site is probably exposed on the face of
the DNA opposite to where the protein binds as a result of the
expansion of the minor groove (Fig. 3).

            5         10        15
5'- A A A A A G G[G G A A]G T G G G   -3'
3'-   T T T T C C[C C T T]C A C C C T -5'
          30        25        20

Fig. 5. Sequence of the oligonucleotide bound to the PU.1
protein in the crystal structure. The GGAA recognition core
sequence as well as the bases on the complementary strand are enclosed
in a box. The PU.1 domain makes contacts with bases on both strands
within this core. The dots designate seven phosphates that are
neutralized by interactions with basic residues. With the exception of the
phosphate at base 14, all of these phosphates lie on one face of the DNA
helix.
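
The average twist and the number of base pairs per turn quoted in this section can be rechecked directly from the helical twist column of Table II. The short sketch below is added for illustration and is not the Babcock and Olson analysis itself; it simply averages the 14 tabulated step values and converts the mean twist into base pairs per helical turn.

```python
# Helical twist values (degrees) for the 14 dinucleotide steps listed in Table II.
helical_twists = [
    36.09, 39.67, 34.10, 36.09, 27.50, 29.06, 33.02,
    27.44, 36.02, 39.21, 24.41, 37.07, 32.70, 33.29,
]

mean_twist = sum(helical_twists) / len(helical_twists)
# One full turn is 360 degrees, so the repeat is 360 divided by the mean twist.
base_pairs_per_turn = 360.0 / mean_twist

print(f"mean twist = {mean_twist:.1f} degrees")
print(f"base pairs per turn = {base_pairs_per_turn:.1f}")
```

Both values agree with the figures given in the text (33° and 10.8 base pairs per turn) once rounded.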

The DNA bending that is stabilized by the PU.1 domain may
serve as an illustration of the hypothesis of DNA bending by
phosphate neutralization. It has been demonstrated, by the
introduction of neutral methylphosphonate analogues in DNA
fragments bearing polyadenylate tracts (32), that bending of the
DNA occurs when the phosphate charges are neutralized on
one face of the DNA helix, due to repulsion of the remaining
anionic phosphates. It was proposed (32) that binding of
proteins with cationic surfaces to DNA could also cause the DNA
double helix to “spontaneously relax” toward the surface where
amino acids neutralized phosphate anions through
formation of salt bridges. The PU.1 ETS domain makes
neutralizing contacts with phosphate groups on one face of the
DNA helix, involving consecutive phosphates on either side of
the major groove. The sites of phosphate neutralization are
shown on the DNA sequence in Fig. 5. On the GGAA strand,
neutralizing contacts with the phosphate backbone 5' to the
core sequence are made by Lys[?] and Lys[?] from the wing. On
the complementary strand, the phosphate contacts are 5' to the
core sequence as well as with the phosphate backbone within
the core: Arg[?], Lys[?], and Lys[?] from the HTH loop and
Lys[?] from helix α3. As predicted by the neutralization
experiments (32), the cationic surface of the PU.1 domain binds to
the DNA causing a bend of the duplex oligonucleotide toward
the ETS module that is within the range (~10°) of curvature
estimated experimentally. The bend is toward the “neutral
surface,” i.e. toward the protein. Two of these phosphate
interactions in the minor groove involve conserved residues, Lys²¹⁹
from the HTH loop and Lys²⁴⁵ from the wing. Thus the
loop-helix-loop pattern may influence both DNA recognition and
DNA bending.

Fig. 6. Comparison of the HTH motif in PU.1 with other proteins in the HTH superfamily. Panel A, sequence alignment of residues that
form the HTH motif in PU.1 with classic HTH proteins 434 cro repressor (42) and λ repressor (41), and with α+β-HTH proteins CAP (15), HNF-3γ
(16), and heat shock factor (HSF) (40). The structural helices of the PU.1 ETS domain are indicated by solid bars above the PU.1 sequence and the
helices in the bacterial repressors are indicated by open bars below the λ repressor sequence. This figure was adapted from Fig. 1 in Ref. 43.
Residues that are shaded represent positions in classic HTH that are generally hydrophobic or small (Gly or Ala) in these proteins. The glycine
that is conserved in the bacterial HTH proteins is marked with an asterisk. Note that helix α2 in PU.1 is one turn longer than the counterpart in
the bacterial proteins, yet when the HTH motifs of the repressors are superimposed on the PU.1 HTH, the glycine in the last turn of the PU.1 α2
helix is equivalent to the conserved glycine in the turn of the bacterial proteins (not shown). Panel B, the HTH motifs of PU.1 (thick line), CAP
(medium line; Ref. 15), and heat shock factor (thin line; Ref. 40) are superimposed for comparison. The α3 recognition helix is on the right in the
photograph. Note that the relative orientation of the two helices is closely similar in the three molecules, but the configuration of the residues in
the turn between the helices is different. The turn in the PU.1 domain is seven residues in length, which is intermediate between the extremes
reported for the family of HTH proteins (43, 44).

This  type  of  charge  neutralization  is  not  seen  in  all  protein- 
induced  DNA  bends.  For  example,  the  TATA-binding  protein 
binds  with  extensive  phosphate  backbone  interactions  to  the 
TATA  element  (33).  Yet  in  this  case  the  DNA  is  sharply  kinked 
away  from  the  protein  contacts.  In  CAP  (15)  salt  bridges  and 
other  hydrogen  bonds  to  phosphate  groups  stabilize  a  severely 
kinked  DNA  conformation  with  DNA  bent  at  90°. 

Interactions with the phosphate backbone are seen in
numerous DNA-binding proteins, but these contacts are often
hydrogen bonds and not salt bridges. The hypothesis (32) states
that neutralization of charge by lysines and arginines results in
excess repulsive electrostatic forces that can maintain bending
of the DNA double helix (34). The moderate DNA bending seen
in the complexes of oligonucleotides with paired homeodomains
(35, 36) or HNF-3γ (16) may also result from phosphate
neutralization, since these proteins form phosphate-side chain salt
bridges with 4 or 3 arginines, respectively. However, the
neutralizing contacts are not as extensive as those seen in the
PU.1-DNA complex.

The complementarity of the loop-helix-loop motif of fli-1 with
the DNA from the PU.1 complex also suggests that, like PU.1,
other ETS domains may not significantly deform DNA from
B-DNA conformation, but to date there are few biochemical
data in the literature on DNA bending by ETS domains. In one
study of the ETS domain from the Elk-93 protein, circular
permutation analyses indicated that DNA binding by the
Elk-93 fragment did not induce significant bending of DNA
(37). In contrast, in the human ets-1-DNA complex (14), the
DNA was kinked at a 60° angle due to intercalation of a
tryptophan side chain. The equivalent of this tryptophan,
tyrosine 175 in PU.1, is found in the hydrophobic core and is not
in position to intercalate. Substitution of glycine for this
tyrosine in PU.1 does not affect DNA binding (11). In fli-1 (12), the
equivalent tryptophan is buried in the hydrophobic core and
was not listed among residues in close proximity (≤4 Å) to DNA.
Thus, the molecular basis for kinked DNA cannot be
understood in the context of contacts seen in the PU.1-DNA complex
(11) or inferred in the fli-1 complex (12). DNA bending by
phosphate neutralization is not apparent in the ets-1-DNA
complex, since only one lysine and one arginine form
phosphate-side chain salt bridges. The arginine is the equivalent of
Arg²³⁵ in PU.1 that forms a hydrogen bond with base G-8 in the
GGA core.

Target Specificity - The superimposed models in Fig. 4
suggest that a loop-helix-loop scaffold that brings together
conserved amino acids and conserved DNA bases is a general mode
of DNA recognition by ets proteins. Yet, ets transcription
factors bind to the GGA(A/T) core motif in the context of specific
promoters. To begin to identify residues that influence target
specificity, it is necessary to look for mutations of
non-conserved residues that affect DNA binding. Of the 14 absolutely
conserved residues in the domain, seven contact DNA in the
PU.1 complex. These contacts would be expected to be
maintained for all ets-DNA complexes. In studies of a number of
members of the ets family, mutations have been reported that
affect DNA binding. These mutations, summarized in Table III,
can now be correlated with the atomic model of the PU.1-DNA
complex. Some of these residues are conserved residues, but
others are unique to a particular molecule.

Fig. 7. Comparison of protein-DNA
complexes in HTH proteins. The
loop-helix-loop pattern of DNA recognition in
the PU.1 complex (panel A) is compared to
a classic HTH protein 434 cro repressor
(42) (panel B), to an HTH protein
CAP (15) (panel C), and to a winged-HTH
protein HNF-3γ (16) (panel D). In each of
these complexes, the recognition helix
makes contact in the major groove. The
contacts of the PU.1 domain with DNA
are more extensive and include
interactions from two loops in the minor grooves
on either side of the major groove where
recognition helix α3 binds.


[...] the GGAA strands [...] sequence variability exists in the ets
family [...] specific base contacts with the GGAA sequence and the
bases on the complementary strand [...] has been shown that a single
[...] recognition of ets proteins from GGAA to GGAT [...] equivalent
to residue [...] restricted selectivity for GGAA like the Elf-1/E74
proteins and the reverse [...] recognition (38). [...] and makes a
water-mediated contact to base C-25 on the anti[...] located in the
major groove at the GGAA site. Twelve well [...] hydrogen-bonded to
the bases and [...] network between the two strands. This water [...]
the stability of the duplex and consequently influence specific DNA
recognition.


Since  the  side  chain  of  lysine  is  long,  it  is  possible  that  the 
contact  of  a  shorter  residue  such  as  threonine  would  not  bind  to 
this  water  network  and  could  contact  a  different  base,  i.e  T-23 

he  water  network  itself  could  also  change.  Or.  the  interchange 
of  lysine-threonine  could  permit  DNA  contact  reflecting  the 
stereochemical  difference  in  size  of  adenine  oersus  th^iSe 

the™  I^DNa''^^^  direct  contacts  with  specific  bases  in 
the  PL  .I-DNA  complex  are  made  by  residues  in  the  o3  recog¬ 
nition  hehx.  Two  non-conserved  residues.  Thr--®  and  Gln^^s  ® 
the  ammo-terminal  end  of  this  helix,  make  water-mediated 
contacts  with  bases  C-25  and  C-26,  respectively,  that  are  base 
paired  to  ^anines  8  and  9  in  the  core  GGAA  sequence.  Both  of 
hese  residues  are  unique  to  PU.l/SpiB  in  the  ets  familv,  so 
these  may  represent  PU.  1-specific  contacts. 

[...], conserved in the ets family, is located in the hydrophobic
interior of the protein. While the phenyl ring of this tyrosine is
buried, the hydroxyl group is exposed and lies within 3.6 Å of
G[...]. This residue was not included in our list of DNA contacts
using a conservative cut-off of 3.2 Å for hydrogen bonds/ionic
interactions. Although this interaction may not occur in PU.1, with
a simple side chain rotation a hydrogen bond is possible with the
phosphate backbone. This may be an example of a contact made by a
conserved residue that influences DNA recognition by selected family
members. Substitution of cysteine for this tyrosine abolishes DNA
binding in ets-1 (28).
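
The distance criterion described here can be illustrated with a short sketch. Only the 3.2 Å cut-off and the 3.6 Å separation come from the text; the coordinates are invented for illustration and are not taken from the PU.1 structure, and the function names `distance` and `is_contact` are hypothetical.

```python
import math

HBOND_CUTOFF = 3.2  # Å, the conservative cut-off quoted in the text


def distance(a, b):
    """Euclidean distance between two 3-D points given in Å."""
    return math.dist(a, b)


def is_contact(a, b, cutoff=HBOND_CUTOFF):
    """True when two atoms lie within the cut-off distance."""
    return distance(a, b) <= cutoff


# Hypothetical coordinates (Å), chosen so the two atoms are 3.6 Å apart.
tyr_oh = (0.0, 0.0, 0.0)
phosphate_o = (3.6, 0.0, 0.0)

print(round(distance(tyr_oh, phosphate_o), 2))
print(is_contact(tyr_oh, phosphate_o))              # outside the 3.2 Å cut-off
print(is_contact(tyr_oh, phosphate_o, cutoff=4.0))  # inside a looser cut-off
```

A pair separated by 3.6 Å is therefore excluded under the conservative criterion but would be counted under a more permissive one, which is why the tyrosine does not appear in the contact list.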

In Fig. 6A, the sequence of the HTH motif of PU.1 is compared with
the sequence of “classic” bacterial HTH proteins and other
winged-HTH proteins. The glycine required in the turn between
helices in HTH proteins (39) is also conserved in this position in
ETS domains, although the α2 helix is one turn longer than the helix
in HTH proteins. In PU.1, the glycine lies in the last turn of this
helix. This glycine and other hydrophobic residues in α2 and α3
stabilize the arrangement of these two helices in HTH proteins. Even
this pattern of conserved hydrophobic residues is seen in ets
proteins. In other winged-HTH proteins, HNF-3γ (16) or heat shock
factor (40), the sequence similarities are not as apparent. These
two proteins have prolines in the equivalent position of the
conserved glycine and the presence of this proline may influence the
configuration in the “turn.” On the other hand, ets proteins may
exhibit a helical arrangement that is structurally closer to that in
“classic” HTH proteins. When HTH elements of PU.1 and HTH molecules
such as λ (41) or 434 cro (42) repressors are superimposed, the
glycine is in a structurally equivalent position (not shown).
Moreover, the overall pattern of docking of the recognition helix in
the major groove is quite similar when 434 cro repressor (42), CAP
(15), and PU.1 are compared bound to DNA (Fig. 7). The major
difference is the fact that the recognition helix in PU.1 docks deep
in the major groove with contacts to the bases involving residues
along the entire length of the helix, while DNA contacts in CAP and
other classic HTH proteins are made from residues at the
amino-terminal portion of the helix.

None of the related proteins in the HTH superfamily actually
contact DNA by residues in the HTH turn (43, 44). This novel DNA
contact may be possible in PU.1, as well as other ets proteins,
because the connecting segment between helices is more of a loop
than a turn. The corresponding HTH motifs of heat shock factor (40)
and CAP (15) are compared to PU.1 in Fig. 6B. But it is not simply
the length of the “turn” or “loop” in the HTH motif that accounts
for this DNA contact in PU.1, since other eukaryotic HTH proteins
contain even longer connecting segments (43, 44) and yet do not
contact DNA by this structural feature, for example HNF-3γ (16).
Thus the contacts made by this loop in PU.1 illustrate a new DNA
contact that, to date, is unique to the ets proteins as the newest
members of the HTH superfamily.

Loops and Minor Groove Contacts. Since the sequences in the HTH
loop as well as the loop (wing) between strands β3 and β4 are not
strictly conserved among members of the ets family, these residues
may be important sites for specific recognition by individual
members of the family. In the PU.1-DNA complex, these two loops
contact the minor groove through interactions with the phosphate
backbone closest to the major groove. It is also interesting to note
that the length of both of the contact loops differs among members
of the family, with the PU.1 loop containing an “extra” glycine at
residue 220 and lacking a glycine after residue 247. Other residues
in these loops may also provide specific contacts to bases in other
ets proteins. For example, the change of arginine→aspartic acid
(equivalent to 244 in PU.1) affects DNA binding in Elk-1 (45).

Since ets proteins bind DNA as monomers, it could be expected that
there would be extensive contacts to stabilize the interaction.
HNF-3γ also binds DNA as a monomer (16). In the HNF-3γ complex,
three regions were involved in DNA recognition: the recognition
helix and two wings. The location of the first wing between the last
two strands in the β-sheet corresponds topologically to the wing in
PU.1, but contacts from the second wing emanate from a loop at the
COOH terminus of the domain. The structural equivalent of this
second loop is absent in PU.1. In CAP, the major DNA contacts are
made from the recognition helix. This protein binds DNA as a dimer.
The surface area on CAP that is buried on DNA binding is 1187 Å².
Similarly, the surface area buried when 434 cro repressor binds DNA
is 1306 Å². But with the formation of the DNA complex with the PU.1
ETS domain, 1701 Å² of surface area is buried. The significantly
greater surface area of the PU.1 domain covered reflects the
extensive protein-DNA contact region extending for more than 30 Å
(11).
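
To put the three buried surface areas on a common footing, the following sketch computes how much more surface the PU.1 ETS domain buries relative to CAP and 434 cro. The three areas are the values quoted above; the dictionary keys and the helper `percent_more` are names chosen here for illustration only.

```python
# Buried surface areas (Å^2) on DNA binding, as quoted in the text.
buried_area = {"CAP": 1187, "434 cro": 1306, "PU.1 ETS": 1701}


def percent_more(a, b):
    """Percentage by which area a exceeds area b."""
    return 100.0 * (a - b) / b


pu1 = buried_area["PU.1 ETS"]
for name in ("CAP", "434 cro"):
    extra = percent_more(pu1, buried_area[name])
    print(f"PU.1 ETS buries {extra:.0f}% more surface than {name}")
```

With these figures the PU.1 domain buries roughly 43% more surface than CAP and roughly 30% more than 434 cro.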

The PU.1-DNA model suggests that residues from the two loops
contribute the critical interactions for recognition of bases other
than the conserved GGAA core when the core is embedded in specific
promoter sequences. The loops approach segments of the DNA that are
adjacent to the conserved core sequence and therefore these
interfaces are stereochemically suitable to permit sequence-specific
interactions by a given family member while maintaining the
consensus interactions at GGA(A/T). Moreover, the contacts from
these loops may mediate specific base interactions by stabilizing a
bend toward the protein. Future extensive mutational studies of
amino acids that contact DNA are needed to identify these residues.
Ultimately, crystal structures of other ets proteins complexed to
DNA can be compared to distinguish unique DNA contacts.

The Ets family of transcription factors, of which there are now
about 35 members, regulate gene expression during growth and
development. They share a conserved domain of around 85 amino acids
which binds as a monomer to the DNA sequence 5'-C/AGGAA/T-3'. We
have determined the crystal structure of an ETS domain complexed
with DNA, at 2.3-Å resolution. The domain is similar to α + β
(winged) ‘helix-turn-helix’ proteins and interacts with a
ten-base-pair region of duplex DNA which takes up a uniform curve of
8°. The domain contacts the DNA by a novel loop-helix-loop
architecture. Four of the amino acids that directly interact with
the DNA are highly conserved: two arginines from the recognition
helix lying in the major groove, one lysine from the ‘wing’ that
binds upstream of the core GGAA sequence, and another lysine, from
the ‘turn’ of the ‘helix-turn-helix’ motif, which binds downstream
and on the opposite strand.
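
The consensus 5'-C/AGGAA/T-3' can be written as the pattern `[CA]GGA[AT]`. The sketch below, a minimal illustration rather than any method used in the study, scans both strands of a sequence for that pattern. The function names and the example 16-mer are hypothetical; the example is not the oligonucleotide used for crystallization.

```python
import re

# Consensus from the text: 5'-(C/A)GGA(A/T)-3'.
# The lookahead lets overlapping sites be reported as well.
ETS_CORE = re.compile(r"(?=([CA]GGA[AT]))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_ets_sites(seq):
    """Return (strand, 0-based start, site) tuples for both strands.

    Starts on the '-' strand are given in top-strand coordinates.
    """
    seq = seq.upper()
    hits = [("+", m.start(), m.group(1)) for m in ETS_CORE.finditer(seq)]
    rc = reverse_complement(seq)
    for m in ETS_CORE.finditer(rc):
        start = len(seq) - (m.start() + len(m.group(1)))
        hits.append(("-", start, m.group(1)))
    return hits


# Hypothetical 16-mer used only to exercise the function.
example = "TTGCAGGAAGTGACCA"
print(find_ets_sites(example))
```

Scanning the reverse complement matters because the core may lie on either strand of a promoter.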

The PU.1 (Spi-1, Sfpi-1) transcription factor is an Ets protein
expressed in haematopoietic cells. PU.1 is a regulatory protein for
differentiation of monocytes and macrophages and for B-cell
maturation (reviewed in ref. 2). The ETS domain of PU.1 was
co-crystallized with a 16 base-pair oligonucleotide containing the
recognition sequence. The structure was solved by the multiple
isomorphous replacement and anomalous scattering (MIRAS) method
(Table 1). The electron density was clearly defined (Fig. 1) for
residues 171 to 258, which encompasses the entire conserved ETS
domain. The PU.1 domain assumes a tight globular structure
(33 × 34 × 38 Å) formed by three α-helices and a four-stranded
antiparallel β-sheet (Fig. 1). The domain topology is similar to the
structures of other Ets family proteins Fli-1 (ref. 7), murine Ets-1
(ref. 8) and human Ets-1 (ref. 9) determined in solution by NMR. The
structural studies revealed a common folding pattern for ETS domains
that is similar to α + β helix-turn-helix (HTH) DNA-binding proteins
including CAP (ref. 10) and resembles ‘winged’ HTH proteins such as
GH5 (ref. 11), HNF-3γ (ref. 12) and HSF (ref. 13). There are three
sites of protein-DNA contact: the recognition helix (α3), the loop
between β-strands 3 and 4 (a ‘wing’) and the turn in the HTH motif
(α2-turn-α3). The turn between α2 and α3 is longer than the
equivalent in many other HTH proteins, and is actually a loop. The
DNA-binding motif in PU.1, and probably other members of the Ets
family, can be described more appropriately as a loop-helix-loop
motif. Therefore the large Ets family defines a new variant subclass
of the helix-turn-helix DNA-binding proteins with a novel mode of
DNA recognition.

The protein-DNA contacts in the PU.1 complex are detailed in Fig. 2.
Four strictly conserved residues on the surface of the domain are
likely to be important for DNA binding by all members of the Ets
family. Arg232 and Arg235 emanate from helix α3 and contact bases in
the GGAA sequence in the major groove. These contacts represent the
core structure for DNA recognition by members of the Ets family
because they involve both strictly conserved amino acids and bases
in the consensus sequence recognized by these transcription factors
(see Fig. 3b). The equivalent arginines 81 and 84 in Ets-1 (ref. 9)
do not contact the GGAA bases, but intermolecular nuclear Overhauser
effects between these arginines and DNA were observed in the Fli-1
NMR studies (ref. 7). Lys245 extends from β3 just adjacent to the
loop (‘wing’), and Lys219 is located in the ‘loop’ of the HTH motif.
Lys245 contacts the phosphate backbone of the GGAA strand in the
minor groove upstream from the core sequence (Fig. 3c) and Lys219
forms a salt bridge with the phosphate backbone of the opposite
strand downstream of the GGAA core (Fig. 3d). Substitutions of
glycine at each of these four conserved sites abolished DNA binding,
confirming the functional importance of these contacts (see Fig. 2).

† Present address: Department of Microbiology and Immunology,
Indiana University School of Medicine, Indianapolis, Indiana
46202-5120, USA.

Mutations of conserved residues that contact the phosphate backbone
also affect DNA binding. Substitution of glycine at Leu174 or Trp215
abolished DNA binding in PU.1. Similarly, substitution of any amino
acid in Ets-1 (ref. 14) at the equivalent of PU.1 residues Lys219
and Arg222 that bind the phosphate backbone disrupted DNA binding.
These minor-groove contacts might represent a conserved pattern for
protein ‘docking’ in the Ets family. In Fli-1 (ref. 7), the
equivalents of Leu174, Lys219 and Lys222 showed large chemical
shifts on DNA binding in the NMR studies (the counterpart of Trp215
was buried).

Water molecules also participate in protein-DNA recognition in the
PU.1 complex (Fig. 2). There are 27 well-ordered solvent molecules
around the DNA. Solvent molecules in the major groove are
hydrogen-bonded to the bases and also form a hydrogen-bonded network
between the two strands that might contribute to the stability of
the duplex and consequently influence specific DNA recognition.
Conserved Arg232 and Arg235 each form direct and water-mediated
contacts with the bases. Three other residues also contact DNA bases
through water molecules: Thr226, Gln228 and Asn236. These residues
are not conserved in the Ets family and might represent interactions
that are unique to the PU.1 protein. Thr226 and Gln228, at the
amino-terminal end of helix α3, make water-mediated contacts with
bases C25 and C26 respectively that are base-paired to guanines 8
and 9 in the core sequence.


TABLE 1
Structure determination and refinement

Phasing statistics
                                  Native    Hg        I(29)     I(13)     I(31)
Resolution (Å)                    2.3       3.0       2.9       3.0       2.8
Observed reflections              60,095    25,081    20,709    20,512    23,308
Unique reflections                20,105    14,902    13,258    12,910    15,397
Completeness (%)                  97        79        65        69        68
R_sym (%)*                        5.0       3.6       4.0       4.3       3.6
R_iso (%) to 3.0 Å†                         13.0      14.4      15.9      13.0
Number of sites                             2         2         2         2
For isomorphous data (I/σ ≥ 3)
  Phasing power‡                            1.33      1.76      1.04      0.98
  To resolution (Å)                         3.0       3.0       3.0       3.0
  R_Cullis§                                 0.62      0.57      0.68      0.67
For anomalous data (I/σ ≥ 3)
  Phasing power‡                            1.0       1.41      1.13      1.43
  To resolution (Å)                         3.0       3.0       3.0       3.0
Mean figure of merit (10-3.0 Å) is 0.65.

Refinement statistics
Resolution range                  8-2.3 Å
Average B (Å²)                    20.1
Crystallographic R-factor (%)     23.7
R_free (%)                        29.9
Number of reflections used        16,898 F > 3σ(F)
Number of protein atoms           1,486
Number of DNA atoms               1,300
Number of solvent atoms           88

The crystallization of the PU.1 ETS domain (residues 160-272) with a 16-bp synthetic DNA oligonucleotide containing the recognition sequence was
described previously. Crystals formed in the space group C2 with a = 89.1, b = 101.9, c = 55.6 Å and β = 111.2°, with two complexes in the asymmetric
unit. Phase determination. Four heavy-atom derivatives were prepared by soaking crystals of the native complex and by co-crystallizing iodinated
oligonucleotides with the PU.1 domain. The locations of the iodinated bases are indicated in Fig. 2. Multiple isomorphous replacement phases, including
anomalous data, were calculated. The package PHASES was used to refine heavy-atom positions, B-factor/occupancies and to calculate phases to 3.0-Å
resolution with an overall figure of merit of 0.65. The initial MIRAS map (3.0 Å) was improved by solvent flattening by the method of Wang and with
non-crystallographic density averaging. Model building and refinement. The improved MIRAS electron-density map was used to build the model with the
interactive graphics programs TOM based on FRODO and [...]. The density for the DNA helix was a prominent feature of the map. To fit the DNA, an ‘ideal’
B-DNA duplex was generated with the program QUANTA (Molecular Simulations, Inc.) and fitted to the density as a rigid body. After the DNA was positioned, a
polyalanine chain was constructed with the BONES option of the Alberta/Caltech program TOM. Subsequently side chains for all residues with clear electron
density were added to the model. There were 11 disordered residues at the N terminus of the domain and 14 disordered residues at the C terminus so these
amino acids were not included in the model. For all other residues representing the complete ETS domain, the electron density was clear (see Fig. 1) and
allowed unambiguous fitting of both backbone and side-chain atoms. Manual adjustments of individual DNA bases were made to fit the electron density. In
the program X-PLOR, the stereochemistry of the protein was optimized to bond and angle parameters developed by Engh and Huber and for DNA by using
parameters of Parkinson et al. Weak restraints were placed on all ribose conformations. One cycle of simulated annealing at 3,000 K (ref. 24) was followed
by cycles of manual model building, positional refinement and B-factor refinement. More data were added as the refinement progressed in increments: 3,
2.8, 2.6 and 2.3 Å. A total of 88 solvent oxygens (⟨B⟩ = 22 Å²) have been added to the model at this stage of the refinement. Main-chain torsion angles for all
non-glycine residues fall within energetically favourable Ramachandran boundaries. The r.m.s. difference for 84 α-carbon atoms in the two complexes in the
asymmetric unit is 0.35 Å.

† R_iso is Σ||F_PH| − |F_P|| / Σ|F_P|, where |F_P| and |F_PH| are structure factors for the protein and derivative, respectively.

‡ Phasing power is the r.m.s. value of |F_H| / E, where E is residual lack of closure.

§ R_Cullis = Σ||F_PH| ± |F_P| − |F_H(calc)|| / Σ|F_PH − F_P| for centric reflections, where F_H(calc) is the calculated heavy-atom structure factor.
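
As a worked illustration of the isomorphous difference statistic, the sketch below evaluates R_iso assuming the standard definition, R_iso = Σ||F_PH| − |F_P|| / Σ|F_P|, summed over reflections measured in both data sets. The amplitudes are invented for the example and are not data from this study; the function name `r_iso` is hypothetical.

```python
def r_iso(f_native, f_derivative):
    """R_iso = sum(||F_PH| - |F_P||) / sum(|F_P|) over matched reflections."""
    if len(f_native) != len(f_derivative):
        raise ValueError("amplitude lists must be the same length")
    numerator = sum(
        abs(abs(fph) - abs(fp)) for fp, fph in zip(f_native, f_derivative)
    )
    denominator = sum(abs(fp) for fp in f_native)
    return numerator / denominator


# Invented amplitudes for four reflections (arbitrary units).
f_p = [100.0, 80.0, 60.0, 40.0]   # native, |F_P|
f_ph = [110.0, 72.0, 66.0, 36.0]  # derivative, |F_PH|

print(f"R_iso = {100 * r_iso(f_p, f_ph):.1f}%")
```

The invented amplitudes differ by 28 units in total against a native sum of 280, giving an R_iso of 10%; the derivatives in Table 1 fall between 13.0% and 15.9%.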
FIG. 1 Overall structure of the PU.1-DNA complex. a, Stereoview of the refined 2.3-Å (1.5σ)
2|Fo| − |Fc| electron-density map at the protein-DNA interface. Interactions of protein (gold), DNA
(red) and water (white) in the major groove at the GGAA core recognition sequence are shown. The two
strictly conserved residues, Arg232 and Arg235 from recognition helix α3, make direct contact with
bases in the major groove. There is a tight network of water molecules at this site in the major groove (Fig.
2). b, Ribbon drawing of the PU.1 ETS domain. The module (residues 171-258) is composed of three
α-helices and a four-stranded antiparallel β-sheet. In the interior of the domain, a hydrophobic core is
formed with 19 side chains including seven strictly and eight highly conserved residues. The major
structural features that contact the DNA are indicated: the recognition helix α3 (h), the turn in the HTH
motif (t) and the loop between β-strands 3 and 4 (w) corresponding to the ‘wing’ in these proteins. At the
N-terminal end of the fragment, helix α1 begins at residue 172. The C-terminal segment, which is
disordered in the PU.1-DNA complex, assumes an α-helical conformation in the unbound Ets-1 NMR
structure. This segment might unfold in PU.1 with DNA binding. c, Space-filling model of the PU.1 ETS
domain-DNA complex. Protein-DNA interactions include both major and minor groove contacts over a
distance of 30 Å. The PU.1 transcription factor (gold) binds to DNA as a monomer, so it is not surprising
that extensive DNA contact sites exist in addition to the recognition sequence to stabilize binding. HNF-3γ
(ref. 12) and GH5 (ref. 11) also bind to target DNA as monomers. In the HNF-3γ-DNA complex, three
regions were involved in DNA recognition: the recognition helix and two ‘wings’. The location of the first
‘wing’ between the last two strands in the β-sheet corresponds topologically to the ‘wing’ in PU.1, but
contacts from the second ‘wing’ emanate from a loop at the C terminus of the domain. The structural
equivalent to the second ‘wing’ is absent in PU.1.


FIG. 3 PU.1-DNA complex. a, The 16-bp oligonucleotide
bound in complex to the PU.1 ETS domain is shown in grey,
with the GGAA sequence coloured red. The ETS domain is
represented by an orange ribbon model with the side chains
for four conserved residues that contact DNA shown. When
glycine was introduced at each of these sites, DNA binding
was lost (Fig. 2). b, Detailed close-up view, showing that
Arg232 and Arg235 from the recognition helix make hydrogen
bonds with the bases GGA of the PU core sequence.
Arg235(NH2) forms a hydrogen bond with G8(O6), whereas
Arg232(NH1) makes hydrogen bonds with two bases,
G9(O6) and A10(N6). These arginines are strictly conserved
in all members of the Ets family, and the GGA sequence is the
consensus DNA sequence recognized by the Ets proteins.
Therefore the interactions shown here represent the paradigm
for Ets recognition, which is expected to be reproduced
in all Ets protein-DNA complexes. c, Interaction of the ‘wing’:
Lys245(NZ) contacts the phosphate backbone at G6(O2P).
d, Interaction of Lys219(NZ) from the loop in the HTH motif,
which contacts the phosphate backbone at C22(O3P) and
T23(O2P). This figure was generated with the graphics
program QUANTA (Molecular Simulations, Inc.).

[Fig. 2c alignment panel: PU.1 MOUSE and ETS-1 HUMAN sequences, with secondary structure elements α2, α3 and β3 marked]


FIG. 2 Protein-DNA contacts in the PU.1-DNA complex. a, Backbone of
the PU.1 ETS-domain-DNA complex. The DNA is bent by 8° from canonical
B-DNA structure and curved nearly uniformly along the entire 16 bp.
Analysis of the DNA structure demonstrated an average helical twist
of 33°, an average rise per base pair of 3.2 Å and 10.8 bp per turn. The
minor groove is slightly enlarged (2-3 Å from the mean) in the GGAA
(bold) region at the midpoint of the oligonucleotide. In the Ets-1-DNA
complex, a 60° kink is induced between base pairs 6 and 7 by intercalation
of the side chain of Trp28. The equivalent of this tryptophan, Tyr175 in
PU.1, shown in the model, is located in the hydrophobic core, excluding the
possibility for intercalation with the DNA bases. Substitution of glycine for
this tyrosine did not affect DNA binding. Furthermore the site of intercalation
in the Ets-1-DNA complex, base pairs 6 and 7, is located at the opposite
extreme of the DNA duplex, upstream of the GGAA core sequence. b,
Sequence of the oligonucleotide bound to the PU.1 protein (GGAA PU box in
bold lines). Residues that contact the DNA through main-chain atoms are
underlined. Well-defined solvent molecules located within 3.2 Å of protein
or DNA atoms are identified by an encircled W. Contacts from residues of the
‘wing’ are made with the nucleotides upstream of the GGAA sequence, and
residues from the loop in the HTH motif interact with the opposite strand,
downstream of the GGAA site. The direction of the DNA was confirmed by
the location of the three iodinated bases (13, 29, 31; black dots) used for
phase calculation. Seven of the residues that contact DNA are strictly
conserved and four others are highly conserved. c, Sequence alignment of
the PU.1 and Ets-1 ETS domains, representing extremes of evolutionary
divergence in the family. Residues strictly conserved in all Ets proteins are
shown in black boxes; dashes indicate gaps within the family. Numbering
and secondary structural features of the PU.1 domain are indicated. The
results of mutational analysis when glycine was substituted for a residue are
also shown. The effects of the interchanges are labelled + or − above the
sequence, indicating that DNA binding was retained or abolished. Mutations
were generated essentially as described.
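
The helical parameters quoted in this legend are related by simple arithmetic, which the following sketch checks. It assumes a regular helix, so that twist per base pair is 360° divided by base pairs per turn and pitch is rise multiplied by base pairs per turn; the variable names are chosen here for illustration, and the quoted values are rounded, so small discrepancies are expected.

```python
RISE_PER_BP = 3.2   # Å, average rise per base pair quoted in the legend
BP_PER_TURN = 10.8  # base pairs per helical turn quoted in the legend

twist = 360.0 / BP_PER_TURN        # average twist per base pair, degrees
pitch = RISE_PER_BP * BP_PER_TURN  # Å travelled along the axis per turn
length_16bp = 16 * RISE_PER_BP     # approximate axial length of 16 bp, Å

print(f"twist  = {twist:.1f} degrees per bp")
print(f"pitch  = {pitch:.1f} Å per turn")
print(f"length = {length_16bp:.1f} Å for 16 bp")
```

The twist implied by 10.8 bp per turn is about 33.3°, consistent with the reported average of 33° after rounding.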


The turn in the HTH motif is actually a loop, and because the
sequences in this loop as well as the loop (‘wing’) between strands
β3 and β4 are not strictly conserved among members of the Ets
family, these residues might be important sites for specific
recognition by individual members of the family. In fact, the lengths of
both of the contact loops differ among members of the family, with
the PU.1 loop containing an ‘extra’ glycine at residue 220 and
lacking a glycine after residue 247. Such conformational differences
are expected between family members, but the contrast
between the PU.1 and Ets-1 complexes was unexpected. The
striking distinction in the mode of DNA contact by the PU.1
and Ets-1 domain could reflect extreme evolutionary divergence
between members of the Ets family. Alternatively, it should be
noted that the Ets-1-DNA complex was formed under denaturing
conditions and it is possible that the Trp intercalation occurred
early during the renaturation step with subsequent protein
refolding.

Future extensive mutational studies of amino acids that contact
DNA in Ets proteins are needed to identify residues that mediate
recognition of a specific DNA sequence by a given family member.
Ultimately, crystal structures of other Ets proteins complexed to
DNA must be compared to distinguish unique DNA contacts.